Scenario
By developing the Night-time Safety Index, I aim to identify areas within Melbourne that are considered high-risk zones: areas with low visibility, little to no street lighting, little foot traffic, and no nearby public transport stop. Conversely, areas will be marked as safer if they have higher visibility, more foot traffic (which acts as a perceived layer of safety) and access to public transport, as the likelihood of other people being present is generally higher. The index is built on an understanding of crime data for low-light and low-population areas.
With the Night-time Safety Index, we can strive to improve these areas, raise their safety scores, and build a safer, more inclusive city for all members of the community.
As a student who commutes to the city regularly for work and university, I often end up travelling after dark. I want to know which areas in Melbourne are safe or dangerous at night, so that I can plan safer routes, avoid dangerous areas, and feel confident moving through the city at night.
At the end of this use case you will:
- Demonstrate the ability to retrieve and process data from a public API
- Apply data cleaning and preprocessing techniques to geospatial and time data
- Perform basic aggregation and filtering methods
- Perform analysis using latitude and longitude data
- Implement data visualisation techniques
Introduction¶
This use case aims to develop a Night-time Safety Index by combining datasets from the City of Melbourne Open Data project. Once combined, the data can be assessed and visualised to demonstrate the varying safety levels of different areas during the night hours. The project goal is to discover areas with low visibility, low foot traffic or poor infrastructure, to aid future development in creating a safer Melbourne for all. The analysis draws on the datasets below, accessed via the Melbourne Open Data API. By combining these datasets, the project aims to create a data-driven solution for a safer and more inclusive city.
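As a minimal sketch of the combination idea (using synthetic coordinates and hypothetical scores, not the real datasets), per-dataset safety scores can be snapped to a shared coordinate grid and summed cell by cell:

```python
import pandas as pd
from functools import reduce

# Two toy datasets with per-feature safety scores (synthetic values,
# for illustration only -- the real scores are derived later in this notebook)
lights = pd.DataFrame({'latitude': [-37.8117, -37.8100], 'longitude': [144.9807, 144.9600], 'safety_score': [1, 1]})
stops = pd.DataFrame({'latitude': [-37.8116, -37.8200], 'longitude': [144.9806, 144.9500], 'safety_score': [3, 3]})

def to_grid(df, precision=3):
    # Round coordinates so nearby features share a grid cell (~100 m at 3 dp)
    out = df.copy()
    out['lat_bin'] = out['latitude'].round(precision)
    out['lon_bin'] = out['longitude'].round(precision)
    return out.groupby(['lat_bin', 'lon_bin'], as_index=False)['safety_score'].sum()

# Merge the gridded datasets and sum their scores cell by cell
frames = [to_grid(lights), to_grid(stops)]
combined = reduce(lambda a, b: a.merge(b, on=['lat_bin', 'lon_bin'], how='outer'), frames)
score_cols = [c for c in combined.columns if c.startswith('safety_score')]
combined['index_score'] = combined[score_cols].sum(axis=1)
print(combined[['lat_bin', 'lon_bin', 'index_score']])
```

The 3-decimal-place grid (roughly 100 m) is an assumed cell size chosen for illustration; a finer or coarser grid would change how scores accumulate.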
Dataset Links
- Bus Stops data link: https://data.melbourne.vic.gov.au/explore/dataset/bus-stops/api/
- Street Lighting data link: https://data.melbourne.vic.gov.au/explore/dataset/street-lights-with-emitted-lux-level-council-owned-lights-only/api/
- Pedestrian counting link: https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/api/
- Feature lighting data link: https://data.melbourne.vic.gov.au/explore/dataset/feature-lighting-including-light-type-wattage-and-location/api/
Importing Required Libraries¶
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import requests
import os
from functools import reduce
import folium
from folium.plugins import HeatMap
Importing Datasets¶
Importing Dataset using API v2.1
def fetch_data(base_url, dataset, api_key, num_records=99, offset=0):
    all_records = []
    max_offset = 9900  # Maximum offset allowed by the API
    while True:
        # Stop once the maximum offset is reached
        if offset > max_offset:
            break
        # Build the API request URL
        filters = f'{dataset}/records?limit={num_records}&offset={offset}'
        url = f'{base_url}{filters}&api_key={api_key}'
        # Send the request
        try:
            result = requests.get(url, timeout=10)
            result.raise_for_status()
            records = result.json().get('results')
        except requests.exceptions.RequestException as e:
            raise Exception(f"API request failed: {e}")
        if records is None:
            break
        all_records.extend(records)
        if len(records) < num_records:
            break
        # Advance the offset for the next request
        offset += num_records
    # Collect all records into a DataFrame
    df = pd.DataFrame(all_records)
    return df
# Retrieve API key from environment variable
API_KEY = os.environ.get('MELBOURNE_API_KEY')
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
Importing Council Owned Street light data¶
Several key tasks are completed to prepare the street light dataset for future analysis. First, the dataset is imported using the Melbourne Open Data v2.1 API, ensuring all the data is up to date. Next comes coordinate extraction: the geo_point_2d field stores the lat and lon data in dictionary format, so new longitude and latitude columns are created, both extracted from geo_point_2d. The data is then validated to ensure it is clean and ready for mapping, with each row checked for whether geo_point_2d is a valid dictionary. Finally, a preview of the data is produced with .head().
# Data set name
dataset_street_lighting = 'street-lights-with-emitted-lux-level-council-owned-lights-only'
street_light_df = fetch_data(BASE_URL, dataset_street_lighting, API_KEY)
#Create a new col longitude
street_light_df['longitude'] = street_light_df['geo_point_2d'].apply(lambda x: x['lon'] if isinstance(x, dict) else None)
#Create a new col latitude
street_light_df['latitude'] = street_light_df['geo_point_2d'].apply(lambda x: x['lat'] if isinstance(x, dict) else None)
print(street_light_df.head())
geo_point_2d \
0 {'lon': 144.98066700010136, 'lat': -37.8116730...
1 {'lon': 144.9806740000221, 'lat': -37.81162799...
2 {'lon': 144.9807329998954, 'lat': -37.81139499...
3 {'lon': 144.98073799991442, 'lat': -37.8113810...
4 {'lon': 144.9785690000381, 'lat': -37.81117199...
geo_shape prop_id name addresspt1 \
0 {'type': 'Feature', 'geometry': {'coordinates'... 0 None 0.0
1 {'type': 'Feature', 'geometry': {'coordinates'... 0 None 0.0
2 {'type': 'Feature', 'geometry': {'coordinates'... 0 None 0.0
3 {'type': 'Feature', 'geometry': {'coordinates'... 0 None 0.0
4 {'type': 'Feature', 'geometry': {'coordinates'... 0 None 0.0
xorg ext_id asset_clas label asset_type ... addresspt asset_subt xsource \
0 ESG 35350 None 0.684 None ... 0 None None
1 ESG 35364 None 0.196 None ... 0 None None
2 ESG 35436 None 3.715 None ... 0 None None
3 ESG 35440 None 2.835 None ... 0 None None
4 ESG 36934 None 3.03 None ... 0 None None
profile xdate xdrawing mcc_id roadseg_id longitude latitude
0 None 20140916 None 0 0 144.980667 -37.811673
1 None 20140916 None 0 0 144.980674 -37.811628
2 None 20140916 None 0 0 144.980733 -37.811395
3 None 20140916 None 0 0 144.980738 -37.811381
4 None 20140916 None 0 0 144.978569 -37.811172
[5 rows x 23 columns]
Data Quality Check: Invalid Geolocation and Missing Values¶
Description:
This cell performs essential data quality checks on the street_light_df DataFrame:
Invalid Geolocation Entries:
- Filters and displays rows where the geo_point_2d field is not a dictionary. This helps identify malformed or inconsistent geolocation data.
Dataset Summary:
- Prints a concise summary of the DataFrame's structure, including column types and non-null counts, using .info().
Missing Value Audit:
- Outputs the count of missing (null) values in each column to assess data completeness before proceeding with analysis or modelling.
# Filter and display rows where geo_point_2d is NOT a dictionary
invalid_geo_rows = street_light_df[~street_light_df['geo_point_2d'].apply(lambda x: isinstance(x, dict))]
# Display
print(invalid_geo_rows[['geo_point_2d']])
print(street_light_df.info())
#Missing values in the dataset:
print(street_light_df.isnull().sum())
Empty DataFrame
Columns: [geo_point_2d]
Index: []
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 23 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   geo_point_2d  9999 non-null   object
 1   geo_shape     9999 non-null   object
 2   prop_id       9999 non-null   object
 3   name          0 non-null      object
 4   addresspt1    9999 non-null   object
 5   xorg          9999 non-null   object
 6   ext_id        9999 non-null   object
 7   asset_clas    0 non-null      object
 8   label         9999 non-null   object
 9   asset_type    0 non-null      object
 10  easting       9999 non-null   object
 11  northing      9999 non-null   object
 12  str_id        9999 non-null   object
 13  addresspt     9999 non-null   object
 14  asset_subt    0 non-null      object
 15  xsource       0 non-null      object
 16  profile       0 non-null      object
 17  xdate         9999 non-null   object
 18  xdrawing      0 non-null      object
 19  mcc_id        9999 non-null   object
 20  roadseg_id    9999 non-null   object
 21  longitude     9999 non-null   float64
 22  latitude      9999 non-null   float64
dtypes: float64(2), object(21)
memory usage: 1.8+ MB
None
geo_point_2d       0
geo_shape          0
prop_id            0
name            9999
addresspt1         0
xorg               0
ext_id             0
asset_clas      9999
label              0
asset_type      9999
easting            0
northing           0
str_id             0
addresspt          0
asset_subt      9999
xsource         9999
profile         9999
xdate              0
xdrawing        9999
mcc_id             0
roadseg_id         0
longitude          0
latitude           0
dtype: int64
Overview of street_light_df:
- The dataset contains 9,999 entries and 23 columns.
- The structure was inspected using .info(), .isnull().sum(), and filtering for invalidly structured geolocation data.
Key Findings:
No Invalid Geolocation Format:
- The geo_point_2d column contains valid data for all 9,999 rows.
- The filter for non-dictionary types returned an empty DataFrame, suggesting all entries conform to the expected structure.
High Number of Empty Fields:
- Several columns, namely name, asset_clas, asset_type, asset_subt, xsource, profile, and xdrawing, contain 0 non-null values.
- These columns are likely irrelevant or deprecated and will be removed.
Complete Geolocation Data:
- latitude and longitude columns have no missing values, confirming location data is fully populated.
Cleaning and initialising Safety Scores for Street Light Data¶
Description:
This cell prepares the street_light_df DataFrame for analysis by:
- Adding a new column named safety_score, assigning a default value of 1 to all rows.
- Removing unnecessary columns related to geospatial metadata, internal identifiers and classification details, to simplify the dataset and retain only the features relevant for further analysis.
# Assign a default safety score of 1 to every street light location
street_light_df['safety_score'] = 1
street_light_df.drop(['prop_id', 'geo_point_2d','asset_type','addresspt','profile', 'geo_shape', 'name', 'addresspt1', 'xorg', 'ext_id', 'asset_clas', 'easting', 'northing', 'str_id', 'asset_subt', 'xsource', 'xdrawing', 'mcc_id', 'roadseg_id'], axis = 1, inplace = True)
street_light_df
| label | xdate | longitude | latitude | safety_score | |
|---|---|---|---|---|---|
| 0 | 0.684 | 20140916 | 144.980667 | -37.811673 | 1 |
| 1 | 0.196 | 20140916 | 144.980674 | -37.811628 | 1 |
| 2 | 3.715 | 20140916 | 144.980733 | -37.811395 | 1 |
| 3 | 2.835 | 20140916 | 144.980738 | -37.811381 | 1 |
| 4 | 3.03 | 20140916 | 144.978569 | -37.811172 | 1 |
| ... | ... | ... | ... | ... | ... |
| 9994 | 5.376 | 20140916 | 144.964910 | -37.811920 | 1 |
| 9995 | 4.594 | 20140916 | 144.966492 | -37.815075 | 1 |
| 9996 | 99.022 | 20140916 | 144.963164 | -37.807990 | 1 |
| 9997 | 99.316 | 20140916 | 144.963159 | -37.807973 | 1 |
| 9998 | 48.778 | 20140916 | 144.966164 | -37.814421 | 1 |
9999 rows × 5 columns
Geospatial Visualisation of Street Lights¶
Description:
The scatter plot below shows the geographical distribution of street lights across the mapped area. Each point represents a single street light. The visualisation provides spatial context on the spread of the data and aids in discovering light coverage and potential underlit areas.
plt.figure(figsize=(12, 10))
plt.scatter(street_light_df['longitude'], street_light_df['latitude'], alpha=0.5, s=10)
plt.title('Geographical Distribution of street_light')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()
Overview of Geospatial visualisation:
The scatter plot has provided key insights into the dataset, in particular the spread of the light data. As shown, key areas are missing that should be present when overlaid on a map of the City of Melbourne. This calls into question the spread of the data, what records are missing, and what steps we can take to account for the significant missing street light data.
Importing feature light data¶
The data is fetched using the API and the lat and lon columns are renamed to latitude and longitude to keep a consistent naming convention across all datasets, enabling seamless geospatial analysis and integration with other location-based data.
# Data set name
feature_light_data = 'feature-lighting-including-light-type-wattage-and-location'
#Fetch data
feature_light_df = fetch_data(BASE_URL, feature_light_data, API_KEY)
#Rename columns latitude and longitude
feature_light_df.rename(columns={"lat": "latitude", "lon": "longitude"}, inplace= True)
feature_light_df
| asset_number | asset_description | lamp_type_lupvalue | lamp_rating_w | mounting_type_lupvalue | latitude | longitude | location | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1544260 | Feature Lighting - Birrarung Marr | 13.0 | 70.0 | Pole: Multiple Fixed | -37.818239 | 144.971382 | {'lon': 144.9713815748613, 'lat': -37.81823859... |
| 1 | 1541782 | Feature Lighting - | 13.0 | 35.0 | Pole: Multiple Fixed | -37.822848 | 144.947094 | {'lon': 144.94709354140863, 'lat': -37.8228478... |
| 2 | 1542772 | Feature Lighting - | 12.0 | NaN | Pole: Multiple Fixed | -37.823150 | 144.947204 | {'lon': 144.9472041813461, 'lat': -37.82314998... |
| 3 | 1346470 | Feature Lighting - Docklands | 1.0 | NaN | Canopy | -37.817318 | 144.952251 | {'lon': 144.95225109118593, 'lat': -37.8173181... |
| 4 | 1539337 | Feature Lighting - Newquay Promenade between S... | 9.0 | NaN | Pole: Multiple Fixed | -37.814603 | 144.942694 | {'lon': 144.94269431917522, 'lat': -37.8146026... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8559 | 1347738 | Feature Lighting - Docklands | 1.0 | 18.0 | Wall | -37.824620 | 144.946620 | {'lon': 144.94662008927858, 'lat': -37.8246201... |
| 8560 | 1541845 | Feature Lighting - | NaN | NaN | Pole: Multiple Fixed | -37.823748 | 144.952091 | {'lon': 144.9520910323129, 'lat': -37.82374757... |
| 8561 | 1346811 | Feature Lighting - Docklands | 3.0 | 36.0 | Parapet | -37.817528 | 144.950016 | {'lon': 144.95001579629215, 'lat': -37.8175284... |
| 8562 | 1544683 | Feature Lighting - Seafarers Rest | 2.0 | 14.0 | Pole: Multiple Fixed | -37.822771 | 144.951655 | {'lon': 144.95165526864113, 'lat': -37.8227706... |
| 8563 | 1542075 | Feature Lighting - Arglye Square | 9.0 | NaN | Pole: Multiple Fixed | -37.802565 | 144.966134 | {'lon': 144.9661338329207, 'lat': -37.80256495... |
8564 rows × 8 columns
Data Quality and Summary Statistics Overview¶
Description:
This code cell performs an initial analysis of the feature_light_df dataset to understand its completeness and distribution.
- Missing Value Check: .isnull().sum() displays the total number of missing values in each column.
- Descriptive Statistics: .describe() provides a statistical summary, which aids in understanding the data distribution and in spotting anomalous values.
- Data Structure Summary: .info() prints the data types, non-null counts and memory usage, helping verify data types and flag sparsely populated fields.
print("Missing values per column:")
print(feature_light_df.isnull().sum())
print(feature_light_df.describe())
print(feature_light_df.info())
Missing values per column:
asset_number 0
asset_description 0
lamp_type_lupvalue 1093
lamp_rating_w 4458
mounting_type_lupvalue 611
latitude 0
longitude 0
location 0
dtype: int64
asset_number lamp_type_lupvalue lamp_rating_w latitude \
count 8.564000e+03 7471.000000 4106.000000 8564.000000
mean 1.492630e+06 8.017401 52.039211 -37.818159
std 8.554725e+04 4.428364 53.874435 0.006589
min 1.346354e+06 1.000000 3.000000 -37.844649
25% 1.487632e+06 3.000000 14.000000 -37.822813
50% 1.540754e+06 9.000000 36.000000 -37.819194
75% 1.542894e+06 13.000000 70.000000 -37.815231
max 1.771404e+06 16.000000 500.000000 -37.786462
longitude
count 8564.000000
mean 144.952375
std 0.011364
min 144.921921
25% 144.944251
50% 144.947282
75% 144.963851
max 144.985432
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8564 entries, 0 to 8563
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 asset_number 8564 non-null int64
1 asset_description 8564 non-null object
2 lamp_type_lupvalue 7471 non-null float64
3 lamp_rating_w 4106 non-null float64
4 mounting_type_lupvalue 7953 non-null object
5 latitude 8564 non-null float64
6 longitude 8564 non-null float64
7 location 8564 non-null object
dtypes: float64(4), int64(1), object(3)
memory usage: 535.4+ KB
None
Overview of feature_light_df:
- The output provides an overview of feature_light_df; the longitude and latitude columns are fully populated, providing a good foundation for further analysis.
Key Findings:
Complete Geolocation Data:
- latitude and longitude columns have no missing values, confirming location data is fully populated.
Missing Values:
- Several columns contain missing values: lamp_type_lupvalue (1,093 missing), lamp_rating_w (4,458 missing) and mounting_type_lupvalue (611 missing).
Descriptive Statistics:
- lamp_rating_w ranges from 3W to 500W with a median around 36W.
Visualisation of Feature light's distribution of Lamp Wattage¶
Description:
The histogram below visualises the distribution of lamp wattage, giving a visual representation of light wattage across the dataset. This will aid in deciding the weighting of the safety score based on light wattage.
plt.figure(figsize=(10,6))
plt.hist(feature_light_df['lamp_rating_w'].dropna(), bins=30, edgecolor='black')
plt.title('Distribution of Lamp Wattages')
plt.xlabel('Wattage (W)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
Wattage Distribution Insight¶
Observation:
The majority of street lights in the feature_light_df dataset fall within the 0–100 watt range, with only a few exceeding 400 watts. With most values concentrated at the low end (a right-skewed distribution), this suggests an intentional design strategy in urban lighting infrastructure, assuming it is based on the density of the location.
Interpretation:
The relatively low wattage per individual light may reflect a density-based approach with a higher concentration of lights, as each light requires less power to produce sufficient illumination. This enables:
- Better energy efficiency
- Reduced light pollution
- Even light distribution across high-density areas
Implication for Analysis:
When evaluating visibility within an area, it is essential to take into consideration both individual lamp wattage and the spatial density of lighting.
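As a hedged sketch of combining those two factors, lights can be binned into coarse latitude/longitude cells, computing both the light count (density) and the total wattage per cell. The sample rows and the 3-decimal-place (~100 m) grid are assumptions for illustration, not the notebook's final method:

```python
import pandas as pd

# Synthetic sample mimicking feature_light_df's relevant columns
sample = pd.DataFrame({
    'latitude':  [-37.8182, -37.8184, -37.8228, -37.8231],
    'longitude': [144.9714, 144.9713, 144.9471, 144.9472],
    'lamp_rating_w': [70.0, 35.0, 14.0, None],
})

# Bin to ~100 m cells by rounding coordinates to 3 decimal places
sample['lat_bin'] = sample['latitude'].round(3)
sample['lon_bin'] = sample['longitude'].round(3)

# Per-cell density (light count) and illumination (total wattage; sum skips NaN)
cells = sample.groupby(['lat_bin', 'lon_bin']).agg(
    light_count=('lamp_rating_w', 'size'),
    total_wattage=('lamp_rating_w', 'sum'),
).reset_index()
print(cells)
```

A cell with many low-wattage lights can then be compared fairly against one with a single high-wattage light, which is the trade-off the observation above describes.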
Initialising safety_score for feature_light_df and Cleaning¶
Description:
A safety score is attached to feature_light_df based on lamp_rating_w: the wattage of each light is checked against a set of ranges and a score is assigned accordingly.
As the code below shows, lights under 50 W receive a score of 1, lights under 100 W a score of 2, lights under 300 W a score of 3, and anything brighter a score of 4. Missing values return a score of 1, as such lights are most likely 50 W or less.
Finally, .drop() removes columns that are not used within the dataset, leaving it cleaner and clearer.
feature_light_df['safety_score'] = feature_light_df['lamp_rating_w'].apply(
lambda x: 1 if x < 50 else 2 if x < 100 else 3 if x < 300 else 4 if x < 1000 else 1
)
feature_light_df.drop(['lamp_type_lupvalue','mounting_type_lupvalue', 'location'], axis= 1, inplace= True)
feature_light_df
| asset_number | asset_description | lamp_rating_w | latitude | longitude | safety_score | |
|---|---|---|---|---|---|---|
| 0 | 1544260 | Feature Lighting - Birrarung Marr | 70.0 | -37.818239 | 144.971382 | 2 |
| 1 | 1541782 | Feature Lighting - | 35.0 | -37.822848 | 144.947094 | 1 |
| 2 | 1542772 | Feature Lighting - | NaN | -37.823150 | 144.947204 | 1 |
| 3 | 1346470 | Feature Lighting - Docklands | NaN | -37.817318 | 144.952251 | 1 |
| 4 | 1539337 | Feature Lighting - Newquay Promenade between S... | NaN | -37.814603 | 144.942694 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 8559 | 1347738 | Feature Lighting - Docklands | 18.0 | -37.824620 | 144.946620 | 1 |
| 8560 | 1541845 | Feature Lighting - | NaN | -37.823748 | 144.952091 | 1 |
| 8561 | 1346811 | Feature Lighting - Docklands | 36.0 | -37.817528 | 144.950016 | 1 |
| 8562 | 1544683 | Feature Lighting - Seafarers Rest | 14.0 | -37.822771 | 144.951655 | 1 |
| 8563 | 1542075 | Feature Lighting - Arglye Square | NaN | -37.802565 | 144.966134 | 1 |
8564 rows × 6 columns
Analysis Summary¶
Safety Score Implementation:
The safety_score has been successfully computed and assigned based on the lamp_rating_w values, reflecting the relative illumination capacity of each feature light.
Data Cleaning:
Unnecessary columns such as lamp_type_lupvalue, mounting_type_lupvalue, and location have been removed from feature_light_df to streamline the dataset and focus on relevant variables.
Outcome:
The dataset is now clean, well-structured, and ready for further exploratory analysis or integration with other spatial and pedestrian datasets.
Importing city circle tram stop data¶
Several key tasks are completed to prepare the tram stop dataset for future analysis. First, the dataset is imported using the Melbourne Open Data v2.1 API, ensuring all the data is up to date. Next comes coordinate extraction: the geo_point_2d field stores the lat and lon data in dictionary format, so new latitude and longitude columns are created, both extracted from geo_point_2d. The data is then validated to ensure it is clean and ready for mapping, with each row checked for whether geo_point_2d is a valid dictionary. Finally, a preview of the data is produced with .head().
# Data set name
tram_stops = 'city-circle-tram-stops'
# Fetch dataset
city_circle_tram_stops = fetch_data(BASE_URL, tram_stops, API_KEY)
# Create a new column named latitude
city_circle_tram_stops['latitude'] = city_circle_tram_stops['geo_point_2d'].apply(lambda x: x['lat'] if isinstance(x, dict) else None)
# Create a new column named longitude
city_circle_tram_stops['longitude'] = city_circle_tram_stops['geo_point_2d'].apply(lambda x: x['lon'] if isinstance(x, dict) else None)
# Preview the dataset
print(city_circle_tram_stops.head())
geo_point_2d \
0 {'lon': 144.95786314283018, 'lat': -37.8202377...
1 {'lon': 144.95546153614245, 'lat': -37.8209726...
2 {'lon': 144.95109855638137, 'lat': -37.8219046...
3 {'lon': 144.95644059700524, 'lat': -37.8117714...
4 {'lon': 144.95891745116262, 'lat': -37.8110592...
geo_shape \
0 {'type': 'Feature', 'geometry': {'coordinates'...
1 {'type': 'Feature', 'geometry': {'coordinates'...
2 {'type': 'Feature', 'geometry': {'coordinates'...
3 {'type': 'Feature', 'geometry': {'coordinates'...
4 {'type': 'Feature', 'geometry': {'coordinates'...
name xorg stop_no mccid_str xsource \
0 Melbourne Aquarium / Flinders Street GIS Team 2 None Mapbase
1 Spencer Street / Flinders Street GIS Team 1 None Mapbase
2 The Goods Shed / Wurundjeri Way GIS Team D5 None Mapbase
3 William Street / La Trobe Street GIS Team 3 None Mapbase
4 Queen Street / La Trobe Street GIS Team 4 None Mapbase
xdate mccid_int latitude longitude
0 2011-10-18 4 -37.820238 144.957863
1 2011-10-18 5 -37.820973 144.955462
2 2011-10-18 7 -37.821905 144.951099
3 2011-10-18 16 -37.811771 144.956441
4 2011-10-18 17 -37.811059 144.958917
Data Quality and Summary Statistics Overview¶
Description:
This code cell performs an initial analysis of the city_circle_tram_stops dataset to understand its completeness and distribution.
- Missing Value Check: .isnull().sum() displays the total number of missing values in each column.
- Descriptive Statistics: .describe() provides a statistical summary, which aids in understanding the data distribution and in spotting anomalous values.
- Data Structure Summary: .info() prints the data types, non-null counts and memory usage, helping verify data types and flag sparsely populated fields.
print(city_circle_tram_stops.describe())
print("Missing data")
print(city_circle_tram_stops.isnull().sum())
print(city_circle_tram_stops.info())
         latitude   longitude
count   28.000000   28.000000
mean   -37.814679  144.959475
std      0.004571    0.010734
min    -37.822157  144.938646
25%    -37.818407  144.951395
50%    -37.814528  144.960143
75%    -37.810883  144.969134
max    -37.807603  144.974534
Missing data
geo_point_2d     0
geo_shape        0
name             0
xorg             0
stop_no          0
mccid_str       28
xsource          0
xdate            0
mccid_int        0
latitude         0
longitude        0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28 entries, 0 to 27
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   geo_point_2d  28 non-null     object
 1   geo_shape     28 non-null     object
 2   name          28 non-null     object
 3   xorg          28 non-null     object
 4   stop_no       28 non-null     object
 5   mccid_str     0 non-null      object
 6   xsource       28 non-null     object
 7   xdate         28 non-null     object
 8   mccid_int     28 non-null     object
 9   latitude      28 non-null     float64
 10  longitude     28 non-null     float64
dtypes: float64(2), object(9)
memory usage: 2.5+ KB
None
Overview of city_circle_tram_stops:
- The output provides an overview of city_circle_tram_stops; the longitude and latitude columns are fully populated, providing a good foundation for further analysis.
Key Findings:
Complete Geolocation Data:
- latitude and longitude columns have no missing values, confirming location data is fully populated.
Missing Values:
- mccid_str is missing all 28 values; every other column has no null values.
Descriptive Statistics:
- There are 28 data points (tram stops) in the dataset.
Safety Score assignment and Cleaning¶
Description:
The code cell below assigns a safety score of 3 to all tram stops, as each one is of equal importance. Since there are not many tram stops within the city loop, these stops tend to be places of interest. It is key to understand that these locations tend to be high traffic and are used by many people to travel throughout the city.
The dataset is then cleaned by dropping unwanted columns.
city_circle_tram_stops['safety_score'] = 3
city_circle_tram_stops.drop(['geo_point_2d', 'geo_shape', 'xorg', 'mccid_str', 'xsource', 'xdate', 'mccid_int'], axis= 1, inplace=True)
city_circle_tram_stops
| name | stop_no | latitude | longitude | safety_score | |
|---|---|---|---|---|---|
| 0 | Melbourne Aquarium / Flinders Street | 2 | -37.820238 | 144.957863 | 3 |
| 1 | Spencer Street / Flinders Street | 1 | -37.820973 | 144.955462 | 3 |
| 2 | The Goods Shed / Wurundjeri Way | D5 | -37.821905 | 144.951099 | 3 |
| 3 | William Street / La Trobe Street | 3 | -37.811771 | 144.956441 | 3 |
| 4 | Queen Street / La Trobe Street | 4 | -37.811059 | 144.958917 | 3 |
| 5 | Swanston Street / La Trobe Street | 6 | -37.809619 | 144.963850 | 3 |
| 6 | Russell Street / La Trobe Street | 7 | -37.808877 | 144.966345 | 3 |
| 7 | Parliament / Collins Street | 8 | -37.813581 | 144.974064 | 3 |
| 8 | Swanston Street / Flinders Street | 5 | -37.817632 | 144.966905 | 3 |
| 9 | Elizabeth Street / Flinders Street | 4 | -37.818324 | 144.964479 | 3 |
| 10 | Docklands Park / Harbour Esplanade | D4 | -37.822157 | 144.947733 | 3 |
| 11 | Bourke Street / Harbour Esplanade | D3 | -37.818656 | 144.946508 | 3 |
| 12 | Waterfront City / Docklands Drive | D11 | -37.814465 | 144.938646 | 3 |
| 13 | Spencer Street / La Trobe Street | 1 | -37.813181 | 144.951494 | 3 |
| 14 | Elizabeth Street / La Trobe Street | 5 | -37.810354 | 144.961369 | 3 |
| 15 | Victoria Street / La Trobe Street | 9 | -37.807603 | 144.970701 | 3 |
| 16 | Nicholson Street / Victoria Parade | 10 | -37.808011 | 144.973104 | 3 |
| 17 | Albert Street / Nicholson Street | 10 | -37.809562 | 144.972914 | 3 |
| 18 | Russell Street / Flinders Street | 6 | -37.816673 | 144.970156 | 3 |
| 19 | Market Street / Flinders Street | 3 | -37.819223 | 144.961401 | 3 |
| 20 | Victoria Police Centre / Flinders Street | D6 | -37.821539 | 144.953569 | 3 |
| 21 | Central Pier / Harbour Esplanade | D2 | -37.815427 | 144.945121 | 3 |
| 22 | New Quay Promenade / Docklands Drive | D10 | -37.813415 | 144.941378 | 3 |
| 23 | Etihad Statium / La Trobe Street | D1 | -37.814592 | 144.946551 | 3 |
| 24 | King Street / La Trobe Street | 2 | -37.812488 | 144.953935 | 3 |
| 25 | Exhibition Street / La Trobe Street | 8 | -37.808149 | 144.968793 | 3 |
| 26 | Spring Street / Flinders Street | 8 | -37.815389 | 144.974534 | 3 |
| 27 | Exhibition Street / Flinders Street | 7 | -37.816145 | 144.971969 | 3 |
The City Circle tram stops' safety score is allocated based on the fact that the City Circle is quite a busy area with high foot traffic, and it provides an easy pathway through the city that many people can use to travel directly home, or to a train station that will get them home. As the City Circle is always well lit and carries large foot traffic, each stop has been allocated a safety score of 3, which is quite high.
Importing Pedestrian Counting data¶
Several key tasks are completed to prepare the pedestrian counting dataset for future analysis. First, the dataset is imported using the Melbourne Open Data v2.1 API, ensuring all the data is up to date. Next comes coordinate extraction: the location field stores the lat and lon data in dictionary format, so new latitude and longitude columns are created, both extracted from location. The data is then validated to ensure it is clean and ready for mapping, with each row checked for whether location is a valid dictionary. Finally, a preview of the data is produced with .head(20).
# Data set name
dataset_pedestrian_counting = 'pedestrian-counting-system-monthly-counts-per-hour'
# Fetch dataset
pedestrian_counting_df = fetch_data(BASE_URL, dataset_pedestrian_counting, API_KEY)
# Create a new column named latitude
pedestrian_counting_df['latitude'] = pedestrian_counting_df['location'].apply(lambda x: x['lat'] if isinstance(x, dict) else None)
# Create a new column named longitude
pedestrian_counting_df['longitude'] = pedestrian_counting_df['location'].apply(lambda x: x['lon'] if isinstance(x, dict) else None)
pedestrian_counting_df.head(20)
| id | location_id | sensing_date | hourday | direction_1 | direction_2 | pedestriancount | sensor_name | location | latitude | longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 671120211218 | 67 | 2021-12-18 | 11 | 236 | 248 | 484 | FLDegS_T | {'lon': 144.96562569, 'lat': -37.81688755} | -37.816888 | 144.965626 |
| 1 | 121320240523 | 12 | 2024-05-23 | 13 | 188 | 164 | 352 | NewQ_T | {'lon': 144.94292398, 'lat': -37.81457988} | -37.814580 | 144.942924 |
| 2 | 1371820250124 | 137 | 2025-01-24 | 18 | 11 | 37 | 48 | BouHbr2353_T | {'lon': 144.94612292, 'lat': -37.81894815} | -37.818948 | 144.946123 |
| 3 | 1661720250421 | 166 | 2025-04-21 | 17 | 65 | 71 | 136 | Spen484_T | {'lon': 144.94931703, 'lat': -37.80896733} | -37.808967 | 144.949317 |
| 4 | 531220210901 | 53 | 2021-09-01 | 12 | 144 | 131 | 275 | Col254_T | {'lon': 144.965499, 'lat': -37.81564191} | -37.815642 | 144.965499 |
| 5 | 6320250215 | 6 | 2025-02-15 | 3 | 42 | 48 | 90 | FliS_T | {'lon': 144.96558255, 'lat': -37.81911705} | -37.819117 | 144.965583 |
| 6 | 581620220329 | 58 | 2022-03-29 | 16 | 394 | 566 | 960 | Bou688_T | {'lon': 144.95358075, 'lat': -37.81686075} | -37.816861 | 144.953581 |
| 7 | 5920230903 | 5 | 2023-09-03 | 9 | 234 | 328 | 562 | PriNW_T | {'lon': 144.96787656, 'lat': -37.81874249} | -37.818742 | 144.967877 |
| 8 | 49420240116 | 49 | 2024-01-16 | 4 | 16 | 19 | 35 | Eli501_T | {'lon': 144.95956055, 'lat': -37.80730068} | -37.807301 | 144.959561 |
| 9 | 592320220507 | 59 | 2022-05-07 | 23 | 88 | 30 | 118 | RMIT_T | {'lon': 144.96304859, 'lat': -37.80825648} | -37.808256 | 144.963049 |
| 10 | 622020221229 | 62 | 2022-12-29 | 20 | 152 | 61 | 213 | Lat224_T | {'lon': 144.96216521, 'lat': -37.80996494} | -37.809965 | 144.962165 |
| 11 | 581220241007 | 58 | 2024-10-07 | 12 | 800 | 385 | 1185 | Bou688_T | {'lon': 144.95358075, 'lat': -37.81686075} | -37.816861 | 144.953581 |
| 12 | 49320230809 | 49 | 2023-08-09 | 3 | 14 | 20 | 34 | Eli501_T | {'lon': 144.95956055, 'lat': -37.80730068} | -37.807301 | 144.959561 |
| 13 | 1071920241116 | 107 | 2024-11-16 | 19 | 104 | 104 | 208 | 280Will_T | {'lon': 144.95690188, 'lat': -37.81246271} | -37.812463 | 144.956902 |
| 14 | 72520220712 | 72 | 2022-07-12 | 5 | 8 | 13 | 21 | ACMI_T | {'lon': 144.96872809, 'lat': -37.81726338} | -37.817263 | 144.968728 |
| 15 | 312020220519 | 31 | 2022-05-19 | 20 | 160 | 185 | 345 | Lyg161_T | {'lon': 144.96658911, 'lat': -37.80169681} | -37.801697 | 144.966589 |
| 16 | 512320230805 | 51 | 2023-08-05 | 23 | 29 | 50 | 79 | Fra118_T | {'lon': 144.95906316, 'lat': -37.80841815} | -37.808418 | 144.959063 |
| 17 | 771520220419 | 77 | 2022-04-19 | 15 | 26 | 6 | 32 | HarEsP_T | {'lon': 144.94433026, 'lat': -37.81441438} | -37.814414 | 144.944330 |
| 18 | 1401820231207 | 140 | 2023-12-07 | 18 | 129 | 122 | 251 | Boyd2837_T | {'lon': 144.96185972, 'lat': -37.82590962} | -37.825910 | 144.961860 |
| 19 | 68820250328 | 68 | 2025-03-28 | 8 | 168 | 146 | 314 | FLDegN_T | {'lon': 144.96559789, 'lat': -37.8168479} | -37.816848 | 144.965598 |
Exploratory Pedestrian Activity Patterns¶
The data is first investigated to build a better understanding of its structure and the statistics behind it. To understand foot traffic across Melbourne at various points in the day, I analyse the Pedestrian Counting System dataset, which provides hourly pedestrian counts captured by sensors placed at various locations throughout the city. The first step is to find the maximum and minimum pedestrian volumes, which identifies the busiest and quietest recorded counts, including their exact location and the hour of day at which the data was collected. date_time is a time series sorted by sensing_date, hourday and location_id, allowing for time-trend analysis of which times are busier and which are not. Finally, the data is grouped by sensor_name, sensing_date and hourday for use in more advanced aggregation.
max_count = pedestrian_counting_df['pedestriancount'].max()
min_count = pedestrian_counting_df['pedestriancount'].min()
busiest_loc = pedestrian_counting_df.loc[pedestrian_counting_df['pedestriancount'].idxmax()][['latitude', 'longitude', 'hourday']]
quietest_loc = pedestrian_counting_df.loc[pedestrian_counting_df['pedestriancount'].idxmin()][['latitude', 'longitude', 'hourday']]
date_time = pedestrian_counting_df.sort_values(['sensing_date', 'hourday', 'location_id'])
group_ped_data = pedestrian_counting_df.groupby(['sensor_name', 'sensing_date', 'hourday'])
print(pedestrian_counting_df.info())
print(date_time)
print(max_count)
print(min_count)
print(busiest_loc)
print(quietest_loc)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 9999 non-null int64
1 location_id 9999 non-null int64
2 sensing_date 9999 non-null object
3 hourday 9999 non-null int64
4 direction_1 9999 non-null int64
5 direction_2 9999 non-null int64
6 pedestriancount 9999 non-null int64
7 sensor_name 9999 non-null object
8 location 9999 non-null object
9 latitude 9999 non-null float64
10 longitude 9999 non-null float64
dtypes: float64(2), int64(6), object(3)
memory usage: 859.4+ KB
None
id location_id sensing_date hourday direction_1 \
4175 52620210701 52 2021-07-01 6 10
712 12720210701 12 2021-07-01 7 26
9702 20920210701 20 2021-07-01 9 29
9043 191120210701 19 2021-07-01 11 225
5921 671820210701 67 2021-07-01 18 171
... ... ... ... ... ...
2644 121720250515 12 2025-05-15 17 208
3027 531920250515 53 2025-05-15 19 264
5249 722020250515 72 2025-05-15 20 104
3744 2020250516 2 2025-05-16 0 5
8654 31120250516 31 2025-05-16 1 12
direction_2 pedestriancount sensor_name \
4175 13 23 Eli263_T
712 86 112 NewQ_T
9702 33 62 LtB170_T
9043 146 371 LtB210_T
5921 231 402 FLDegS_T
... ... ... ...
2644 295 503 NewQ_T
3027 157 421 Col254_T
5249 217 321 ACMI_T
3744 18 23 Bou283_T
8654 10 22 Lyg161_T
location latitude longitude
4175 {'lon': 144.9619401, 'lat': -37.81252157} -37.812522 144.961940
712 {'lon': 144.94292398, 'lat': -37.81457988} -37.814580 144.942924
9702 {'lon': 144.9682466, 'lat': -37.81172914} -37.811729 144.968247
9043 {'lon': 144.96550671, 'lat': -37.81237202} -37.812372 144.965507
5921 {'lon': 144.96562569, 'lat': -37.81688755} -37.816888 144.965626
... ... ... ...
2644 {'lon': 144.94292398, 'lat': -37.81457988} -37.814580 144.942924
3027 {'lon': 144.965499, 'lat': -37.81564191} -37.815642 144.965499
5249 {'lon': 144.96872809, 'lat': -37.81726338} -37.817263 144.968728
3744 {'lon': 144.96516718, 'lat': -37.81380668} -37.813807 144.965167
8654 {'lon': 144.96658911, 'lat': -37.80169681} -37.801697 144.966589
[9999 rows x 11 columns]
4900
0
latitude -37.81458
longitude 144.942924
hourday 21
Name: 4041, dtype: object
latitude -37.824018
longitude 144.956044
hourday 4
Name: 8766, dtype: object
Safety Score Assignment¶
Description:
The data is first grouped by location_id, latitude and longitude, and pedestriancount is summed across all hours of the day to give a representative value of how busy each counting location gets over a full day. This approach gives a more accurate picture of which locations see high traffic, peak times included, which is important data to incorporate. A safety_score column is then created, with values assigned from the summed pedestriancount: locations with fewer than 25,000 pedestrians in a day score 1, fewer than 50,000 score 2, fewer than 100,000 score 3, fewer than 150,000 score 4, and anything above the 150,000 threshold scores 5.
# Group and sum pedestrian counts
pedestrian_count_location = pedestrian_counting_df.groupby(['location_id', 'latitude', 'longitude'])['pedestriancount'].sum().reset_index()
# Add safety score based on pedestrian count
pedestrian_count_location['safety_score'] = pedestrian_count_location['pedestriancount'].apply(
lambda x: 1 if x < 25000 else 2 if x < 50000 else 3 if x < 100000 else 4 if x < 150000 else 5
)
pedestrian_count_location.head(100)
| location_id | latitude | longitude | pedestriancount | safety_score | |
|---|---|---|---|---|---|
| 0 | 1 | -37.813494 | 144.965153 | 24025 | 1 |
| 1 | 2 | -37.813807 | 144.965167 | 30509 | 2 |
| 2 | 3 | -37.811015 | 144.964295 | 38775 | 2 |
| 3 | 4 | -37.814880 | 144.966088 | 39926 | 2 |
| 4 | 5 | -37.818742 | 144.967877 | 19821 | 1 |
| ... | ... | ... | ... | ... | ... |
| 93 | 167 | -37.813041 | 144.951560 | 2206 | 1 |
| 94 | 179 | -37.823924 | 144.962997 | 668 | 1 |
| 95 | 180 | -37.794971 | 144.935303 | 233 | 1 |
| 96 | 181 | -37.810095 | 144.961431 | 4486 | 1 |
| 97 | 182 | -37.816275 | 144.955505 | 1484 | 1 |
98 rows × 5 columns
We can clearly see the assigned safety score for each location ID, in order.
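The nested lambda above works, but the same banding can also be expressed with `pd.cut`, which keeps all the thresholds in one place. A minimal sketch on made-up counts (not the real `pedestrian_count_location` frame):

```python
import pandas as pd

# Hypothetical pedestrian totals, for illustration only
demo = pd.DataFrame({'pedestriancount': [24025, 30509, 98000, 149999, 200000]})

# Same thresholds as the lambda: <25000 -> 1, <50000 -> 2, <100000 -> 3, <150000 -> 4, else 5
bins = [-1, 24999, 49999, 99999, 149999, float('inf')]
demo['safety_score'] = pd.cut(demo['pedestriancount'], bins=bins,
                              labels=[1, 2, 3, 4, 5]).astype(int)
print(demo['safety_score'].tolist())  # → [1, 2, 3, 4, 5]
```

Both forms give identical scores on integer counts; `pd.cut` just makes the bin edges easier to audit and change.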
Geospatial Visualisation of Pedestrian Counting sensors¶
Description:
The code cell below plots the locations of all sensors using matplotlib. It demonstrates the spread of the sensors throughout Melbourne.
plt.figure(figsize=(12, 10))
plt.scatter(pedestrian_counting_df['longitude'],pedestrian_counting_df['latitude'], alpha=0.5, s=10)
plt.title('Geographical Distribution of Pedestrian count')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()
As seen in the results, there is a good spread of pedestrian count sensors throughout the city. This is great news, as it means we can rely on the sensor data across the whole study area.
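The visual impression of a good spread can be backed with a quick numeric check of the sensor extent. A sketch using a toy stand-in frame with the same column names as `pedestrian_counting_df`:

```python
import pandas as pd

# Toy stand-in for pedestrian_counting_df (the real frame comes from the API)
df = pd.DataFrame({
    'sensor_name': ['FliS_T', 'RMIT_T', 'Bou688_T', 'FliS_T'],
    'latitude':  [-37.8191, -37.8083, -37.8169, -37.8191],
    'longitude': [144.9656, 144.9630, 144.9536, 144.9656],
})

# Count distinct sensors and measure the bounding box they cover
n_sensors = df['sensor_name'].nunique()
lat_span = df['latitude'].max() - df['latitude'].min()
lon_span = df['longitude'].max() - df['longitude'].min()
print(n_sensors, round(lat_span, 4), round(lon_span, 4))
```

Run on the real frame, a wide lat/lon span and a high distinct-sensor count together support the "good spread" claim quantitatively.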
Visualisation of Pedestrian Counts by Location¶
Description:
The code cell below plots a bar chart of pedestrian counts at each sensor location using matplotlib. This provides a visual representation of the most frequented locations within Melbourne.
pedestrian_count_location_d = pedestrian_counting_df.groupby('location_id')['pedestriancount'].sum()
pedestrian_count_location_d.plot(kind='bar', figsize=(16, 8), color='purple')
plt.title("Pedestrian Counts by Location")
plt.xlabel("Location ID")
plt.xticks(rotation = 90)
plt.tight_layout()
plt.ylabel("Pedestrians count")
plt.show()
From the graph:
- we can see the location IDs with the greatest foot traffic
- we can also see the locations where the least foot traffic exists
- some stand-out location IDs are 35, 24, 41, 47, 59, 66 and 84, with the largest foot traffic in either direction
- low foot-traffic regions can be seen at IDs 10, 44, 46, 51, 71, 75, 76, 78, 118, etc.
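The stand-out IDs above were read off the chart; they can also be pulled out programmatically with `nlargest`/`nsmallest`. A sketch on illustrative totals rather than the real `pedestrian_count_location_d` series:

```python
import pandas as pd

# Illustrative per-location totals (the real series is grouped from pedestrian_counting_df)
totals = pd.Series({35: 152000, 24: 140000, 10: 800, 44: 650, 59: 120000},
                   name='pedestriancount')

print(totals.nlargest(2).index.tolist())   # busiest location IDs: [35, 24]
print(totals.nsmallest(2).index.tolist())  # quietest location IDs: [44, 10]
```

Applied to the real grouped series, this removes any ambiguity from reading bar heights off a dense axis.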
Importing Landmark Data¶
Several key tasks are completed to prepare the landmark dataset for future analysis. First, the dataset is retrieved via the Melbourne Open Data v2.1 API, ensuring the data is up to date. Next comes coordinate extraction: the lat and lon values are stored in a dictionary, so new latitude and longitude columns are created by extracting them from co_ordinates. The data is validated to ensure it is clean and ready for mapping, with each row checked for whether co_ordinates is a valid dictionary. Finally, a preview of the data is produced with .head(), the unique landmark themes are printed to see which landmark types the dataset incorporates, and describe() is used for basic EDA.
# Fetch dataset
landmarks = fetch_data(BASE_URL, 'landmarks-and-places-of-interest-including-schools-theatres-health-services-spor' , API_KEY)
landmarks['latitude'] = landmarks['co_ordinates'].apply(lambda x: x['lat'] if isinstance(x, dict) else None)
landmarks['longitude'] = landmarks['co_ordinates'].apply(lambda x: x['lon'] if isinstance(x, dict) else None)
unique_landmarks = landmarks['theme'].drop_duplicates()
print(unique_landmarks)
print(landmarks.describe())
print(landmarks.head())
0 Transport
1 Mixed Use
2 Leisure/Recreation
3 Place of Worship
13 Health Services
14 Community Use
18 Place Of Assembly
21 Office
30 Purpose Built
31 Vacant Land
35 Education Centre
46 Residential Accommodation
59 Warehouse/Store
112 Specialist Residential Accommodation
179 Retail
225 Industrial
Name: theme, dtype: object
latitude longitude
count 242.000000 242.000000
mean -37.812141 144.961306
std 0.012365 0.017296
min -37.848520 144.908191
25% -37.821102 144.953549
50% -37.813219 144.965589
75% -37.803816 144.972796
max -37.781268 144.989401
theme sub_theme \
0 Transport Railway Station
1 Mixed Use Retail/Office/Carpark
2 Leisure/Recreation Informal Outdoor Facility (Park/Garden/Reserve)
3 Place of Worship Church
4 Place of Worship Church
feature_name \
0 Flemington Bridge Railway Station
1 Council House 2 (CH2)
2 Carlton Gardens South
3 Wesley Church
4 St Augustines Church
co_ordinates latitude longitude
0 {'lon': 144.939277838304, 'lat': -37.788164588... -37.788165 144.939278
1 {'lon': 144.966638432727, 'lat': -37.814259143... -37.814259 144.966638
2 {'lon': 144.971266479841, 'lat': -37.806068457... -37.806068 144.971266
3 {'lon': 144.968168215633, 'lat': -37.810157644... -37.810158 144.968168
4 {'lon': 144.954862000132, 'lat': -37.816974135... -37.816974 144.954862
The EDA provides useful information, such as the number of unique landmark themes, which will prove useful when assigning an adequate safety score to each landmark. We also get a good sense of the data's scope: there are 242 data points. These will help compensate for the lack of street-light data within the city.
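A natural follow-up EDA step is counting how many landmarks fall under each theme, since themes with very few points will contribute little signal to the index. A sketch on a hypothetical slice of the frame:

```python
import pandas as pd

# Hypothetical slice of the landmarks frame, for illustration only
landmarks_demo = pd.DataFrame({'theme': ['Transport', 'Retail', 'Retail',
                                         'Vacant Land', 'Transport', 'Retail']})

# Frequency of each theme, sorted from most to least common
theme_counts = landmarks_demo['theme'].value_counts()
print(theme_counts.to_dict())  # → {'Retail': 3, 'Transport': 2, 'Vacant Land': 1}
```

The same one-liner on the real `landmarks` frame shows at a glance which themes dominate the 242 points.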
Visualising Landmarks¶
Viewing the landmarks on the map gives a good representation of the provided landmarks and the locations where each exists. The map contains a unique icon for each landmark theme, and clicking on an icon shows the landmark's description.
# Map centered on Melbourne
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)
# Assigned appropriate icons to each landmark theme
theme_icons = {
"Transport": ("train", "blue"),
"Mixed Use": ("building", "gray"),
"Leisure/Recreation": ("tree", "green"),
"Place of Worship": ("university", "purple"),
"Health Services": ("plus-square", "red"),
"Community Use": ("users", "darkblue"),
"Place Of Assembly": ("paint-brush", "darkpurple"),
"Office": ("briefcase", "lightgray"),
"Purpose Built": ("cogs", "cadetblue"),
"Vacant Land": ("ban", "black"),
"Education Centre": ("graduation-cap", "orange"),
"Residential Accommodation": ("home", "lightgreen"),
"Warehouse/Store": ("archive", "beige"),
"Specialist Residential Accommodation": ("bed", "lightred"),
"Retail": ("shopping-cart", "pink"),
"Industrial": ("industry", "darkred")
}
#iterate through the landmarks dataframe
for _, row in landmarks.iterrows():
if pd.notnull(row['latitude']) and pd.notnull(row['longitude']):
info_text = row['feature_name']
theme = row['theme']
icon_name, colour = theme_icons.get(theme, ("info-sign", "gray"))
folium.Marker(
location=[row['latitude'], row['longitude']],
popup= info_text,
icon= folium.Icon(icon=icon_name, prefix= "fa", color=colour)
).add_to(m)
# adding legends with HTML
legend_html = """
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
<div style="position: fixed;
bottom: 50px; left: 50px; width: 300px; height: 400px;
background-color: rgba(255, 255, 255, 0.8); border:2px solid grey; z-index:1000; font-size:14px;
padding: 10px;">
<b>Legend</b><br><br>
<i class="fa fa-train" style="color:blue"></i> Transport<br>
<i class="fa fa-building" style="color:gray"></i> Mixed Use<br>
<i class="fa fa-tree" style="color:green"></i> Leisure/Recreation<br>
<i class="fa fa-university" style="color:purple"></i> Place of Worship<br>
<i class="fa fa-plus-square" style="color:red"></i> Health Services<br>
<i class="fa fa-users" style="color:darkblue"></i> Community Use<br>
<i class="fa fa-paint-brush" style="color:darkpurple"></i> Place Of Assembly<br>
<i class="fa fa-briefcase" style="color:lightgray"></i> Office<br>
<i class="fa fa-cogs" style="color:cadetblue"></i> Purpose Built<br>
<i class="fa fa-ban" style="color:black"></i> Vacant Land<br>
<i class="fa fa-graduation-cap" style="color:orange"></i> Education Centre<br>
<i class="fa fa-home" style="color:lightgreen"></i> Residential Accommodation<br>
<i class="fa fa-archive" style="color:beige"></i> Warehouse/Store<br>
<i class="fa fa-bed" style="color:lightred"></i> Specialist Residential Accommodation<br>
<i class="fa fa-shopping-cart" style="color:pink"></i> Retail<br>
<i class="fa fa-industry" style="color:darkred"></i> Industrial<br>
</div>
"""
m.get_root().html.add_child(folium.Element(legend_html))
# Show map
m
What can we see¶
- The spread of the key landmarks within Melbourne
- What landmarks exist within the city
- A better understanding of what steps to take to assign landmark score values
Landmark score¶
Below, a landmark safety score is assigned to each of the key landmarks in the dataset. The scoring is determined by the type of land use and its impact on pedestrian activity, lighting and surveillance. Locations such as health services, education centres and community hubs receive a higher score due to the frequent foot traffic and established infrastructure present there. Conversely, locations such as vacant land, warehouses and industrial zones are assigned lower scores due to lower pedestrian activity, reduced visibility and under-utilisation; these areas are often unvisited and empty at night. This scoring lets us quantify how various landmarks contribute to, or detract from, night-time safety in Melbourne.
#The landmark score is a dictionary with the key as the landmark theme and score as the safety rating provided.
landmark_score = {
'Transport': 2,
'Health Services': 3,
'Education Centre': 2,
'Place of Worship': 2,
'Leisure/Recreation': 1,
'Community Use': 3,
'Residential Accommodation': 2,
'Specialist Residential Accommodation': 1,
'Retail': 2,
'Vacant Land': -1,
'Office' : 1,
'Warehouse/Store': 0,
'Mixed Use': 1,
'Purpose Built': 1,
'Industrial': 0,
'Place Of Assembly': 2
}
#The score is then mapped to the landmarks data with a new column named safety_score with the appropriate score.
landmarks['safety_score'] = landmarks['theme'].map(landmark_score).fillna(0)
landmarks
| theme | sub_theme | feature_name | co_ordinates | latitude | longitude | safety_score | |
|---|---|---|---|---|---|---|---|
| 0 | Transport | Railway Station | Flemington Bridge Railway Station | {'lon': 144.939277838304, 'lat': -37.788164588... | -37.788165 | 144.939278 | 2 |
| 1 | Mixed Use | Retail/Office/Carpark | Council House 2 (CH2) | {'lon': 144.966638432727, 'lat': -37.814259143... | -37.814259 | 144.966638 | 1 |
| 2 | Leisure/Recreation | Informal Outdoor Facility (Park/Garden/Reserve) | Carlton Gardens South | {'lon': 144.971266479841, 'lat': -37.806068457... | -37.806068 | 144.971266 | 1 |
| 3 | Place of Worship | Church | Wesley Church | {'lon': 144.968168215633, 'lat': -37.810157644... | -37.810158 | 144.968168 | 2 |
| 4 | Place of Worship | Church | St Augustines Church | {'lon': 144.954862000132, 'lat': -37.816974135... | -37.816974 | 144.954862 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 237 | Education Centre | School - Primary and Secondary Education | Melbourne Girls Grammar School | {'lon': 144.985089428348, 'lat': -37.831536451... | -37.831536 | 144.985089 | 2 |
| 238 | Retail | Department Store | Myer | {'lon': 144.963855087868, 'lat': -37.813591198... | -37.813591 | 144.963855 | 2 |
| 239 | Retail | Department Store | David Jones | {'lon': 144.964373486798, 'lat': -37.813312726... | -37.813313 | 144.964373 | 2 |
| 240 | Health Services | Medical Services | Mercy Private Hospital | {'lon': 144.984435746587, 'lat': -37.811896809... | -37.811897 | 144.984436 | 3 |
| 241 | Mixed Use | Retail/Office/Carpark | ANZ 'Gothic' Bank | {'lon': 144.961673719242, 'lat': -37.816158066... | -37.816158 | 144.961674 | 1 |
242 rows × 7 columns
Another column, safety_score, is attached, storing the net safety score for each location as determined by its theme. The weights were assigned with reference to documentation of crime events within Melbourne and of areas perceived as safe due to their increased foot traffic and use at all hours of the day. Offices were scored lower as they are often empty after hours. This data provides much-needed information to incorporate into the night-time safety index.
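One caveat with `map(...).fillna(0)` is that any theme missing from the dictionary silently falls back to 0, so it is worth checking coverage explicitly. A sketch with a deliberately incomplete score dictionary (hypothetical, not the real landmark_score):

```python
import pandas as pd

demo_scores = {'Transport': 2, 'Retail': 2}  # deliberately missing 'Vacant Land'
themes = pd.Series(['Transport', 'Retail', 'Vacant Land'])

# Themes present in the data but absent from the score dictionary
unmapped = sorted(set(themes) - set(demo_scores))
print(unmapped)  # → ['Vacant Land']
```

Running the same set difference against `landmarks['theme']` and `landmark_score` confirms every theme received an intentional score rather than the default.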
Visualisation of Safety Score Distribution¶
Description: The code cell implements a bar graph of the mean safety score for each landmark theme, giving a clear visual representation of the spread of the data by landmark.
#EDA on safety score spread
safety_score_theme = landmarks.groupby('theme')['safety_score'].mean()
safety_score_theme.plot(kind='bar', figsize=(14, 7), color='purple')
plt.title("Mean Safety Score by Landmark Theme", fontsize = 18)
plt.xlabel("Landmark theme")
plt.ylabel("Safety score")
plt.xticks(rotation=60)
plt.tight_layout()
plt.grid(True)
plt.show()
The lowest-value themes are Warehouse/Store and Industrial, both at 0, with the only negative score assigned to Vacant Land.
Importing bus stop data¶
Several key tasks are completed to prepare the bus stop dataset for future analysis. First, the dataset is retrieved via the Melbourne Open Data v2.1 API, ensuring the data is up to date. Next comes coordinate extraction: the geo_point_2d column stores the lat and lon values in a dictionary, so new latitude and longitude columns are created by extracting them from geo_point_2d. The data is validated to ensure it is clean and ready for mapping, with each row checked for whether geo_point_2d is a valid dictionary. Finally, a preview of the data is produced with .head().
# Data set names
dataset_bus_stops = 'bus-stops'
# Fetch dataset
bus_stops_df = fetch_data(BASE_URL, dataset_bus_stops, API_KEY)
# Create a new column named latitude
bus_stops_df['latitude'] = bus_stops_df['geo_point_2d'].apply(lambda x: x['lat'] if isinstance(x, dict) else None)
# Create a new column named longitude
bus_stops_df['longitude'] = bus_stops_df['geo_point_2d'].apply(lambda x: x['lon'] if isinstance(x, dict) else None)
print(bus_stops_df.describe())
print(bus_stops_df.head())
prop_id addresspt1 addressp_1 objectid str_id \
count 309.000000 309.000000 309.000000 309.000000 3.090000e+02
mean 6405.006472 25.802489 175.258900 23327.242718 1.296812e+06
std 58324.056187 20.458442 109.574787 13112.345496 1.110742e+05
min 0.000000 0.000000 0.000000 303.000000 1.231165e+06
25% 0.000000 10.980840 88.000000 12390.000000 1.239533e+06
50% 0.000000 21.561304 175.000000 22943.000000 1.249163e+06
75% 0.000000 35.066244 268.000000 35532.000000 1.257190e+06
max 627016.000000 98.326608 360.000000 44401.000000 1.581811e+06
mcc_id roadseg_id latitude longitude
count 3.090000e+02 309.000000 309.000000 309.000000
mean 1.296812e+06 21305.511327 -37.810139 144.953007
std 1.110742e+05 3107.476239 0.015279 0.019688
min 1.231165e+06 0.000000 -37.850563 144.900324
25% 1.239533e+06 20563.000000 -37.821684 144.945702
50% 1.249163e+06 21680.000000 -37.807816 144.957667
75% 1.257190e+06 22386.000000 -37.798203 144.966767
max 1.581811e+06 30708.000000 -37.776878 144.987731
geo_point_2d \
0 {'lon': 144.96889648633675, 'lat': -37.8184248...
1 {'lon': 144.95888238475013, 'lat': -37.8176759...
2 {'lon': 144.95963193312105, 'lat': -37.7818891...
3 {'lon': 144.94716743007305, 'lat': -37.7937265...
4 {'lon': 144.92778487963457, 'lat': -37.8028616...
geo_shape prop_id addresspt1 \
0 {'type': 'Feature', 'geometry': {'coordinates'... 573333 29.149053
1 {'type': 'Feature', 'geometry': {'coordinates'... 0 10.537902
2 {'type': 'Feature', 'geometry': {'coordinates'... 0 25.269643
3 {'type': 'Feature', 'geometry': {'coordinates'... 0 44.230506
4 {'type': 'Feature', 'geometry': {'coordinates'... 0 67.718553
addressp_1 asset_clas asset_type objectid str_id \
0 288 Signage Sign - Public Transport 749 1249454
1 105 Signage Sign - Public Transport 2098 1247042
2 212 Signage Sign - Public Transport 2143 1252383
3 237 Signage Sign - Public Transport 2627 1249788
4 360 Signage Sign - Public Transport 3306 1235311
addresspt asset_subt model_desc mcc_id roadseg_id \
0 606816 None Sign - Public Transport 1 Panel 1249454 0
1 507646 None Sign - Public Transport 1 Panel 1247042 20118
2 108510 None Sign - Public Transport 1 Panel 1252383 22387
3 100087 None Sign - Public Transport 1 Panel 1249788 20919
4 103002 None Sign - Public Transport 1 Panel 1235311 21680
descriptio model_no latitude \
0 Sign - Public Transport 1 Panel Bus Stop Type 12 P.16 -37.818425
1 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16 -37.817676
2 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16 -37.781889
3 Sign - Public Transport 1 Panel Bus Stop Type 8 P.16 -37.793727
4 Sign - Public Transport 1 Panel Bus Stop Type 13 P.16 -37.802862
longitude
0 144.968896
1 144.958882
2 144.959632
3 144.947167
4 144.927785
Safety score assignment and Cleaning¶
bus_stops_simplified is created and incorporates the key elements of bus_stops_df, while disregarding unnecessary information. A safety score of 2 is then assigned to a new safety_score column, with the same score applied throughout the dataset.
As bus stops are often higher-traffic areas and are often located in wide open spaces, the rating is 2.
bus_stops_simplified = bus_stops_df[['objectid', 'latitude', 'longitude']].copy()
bus_stops_simplified['safety_score'] = 2
print(bus_stops_simplified)
     objectid   latitude   longitude  safety_score
0         749 -37.818425  144.968896             2
1        2098 -37.817676  144.958882             2
2        2143 -37.781889  144.959632             2
3        2627 -37.793727  144.947167             2
4        3306 -37.802862  144.927785             2
..        ...        ...         ...           ...
304     44096 -37.799877  144.950054             2
305     44103 -37.822043  144.961272             2
306     44170 -37.846370  144.984817             2
307     44287 -37.810347  144.961123             2
308     44401 -37.797553  144.974887             2
[309 rows x 4 columns]
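With per-location scores now attached to the pedestrian, landmark, and bus stop frames, a natural next step is stacking them into one scored point set. A minimal sketch with made-up rows, assuming each frame carries latitude, longitude and safety_score columns as above:

```python
import pandas as pd

# Toy stand-ins for the three scored frames (the real ones come from the steps above)
ped = pd.DataFrame({'latitude': [-37.8169], 'longitude': [144.9536], 'safety_score': [5]})
lmk = pd.DataFrame({'latitude': [-37.8143], 'longitude': [144.9666], 'safety_score': [1]})
bus = pd.DataFrame({'latitude': [-37.8184], 'longitude': [144.9689], 'safety_score': [2]})

# Tag each frame with its source, then stack into one scored point set
for name, frame in [('pedestrian', ped), ('landmark', lmk), ('bus_stop', bus)]:
    frame['source'] = name
combined = pd.concat([ped, lmk, bus], ignore_index=True)
print(combined[['source', 'safety_score']])
```

Keeping a source column means the combined frame can still be filtered or weighted per dataset when the final index is computed.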
Geospatial Visualisation of Bus Stops¶
Description:
The code cell below plots all of the bus stop locations in bus_stops_df as a scatter plot. This serves as a visual representation of the spread of the data and aids decisions based on the bus stop dataset.
plt.figure(figsize=(12, 10))
plt.scatter(bus_stops_df['longitude'],bus_stops_df['latitude'], alpha=0.5, s=10)
plt.title('Geographical Distribution of Bus Stops')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()
As seen in the scatter plot, the bus stop data is well distributed across the city.
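Because proximity to public transport feeds into the index, the distance from a point of interest to its nearest bus stop is a useful quantity; the haversine formula gives great-circle distance from lat/lon pairs. A sketch with hypothetical coordinates (not taken from the datasets):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical point of interest and two bus stop coordinates
poi = (-37.8136, 144.9631)
stops = [(-37.8184, 144.9689), (-37.8177, 144.9589)]

nearest = min(haversine_m(*poi, *s) for s in stops)
print(round(nearest))  # distance in metres to the closest stop
```

Iterating this over bus_stops_simplified for each scored location would turn transport access into a distance-based component of the index.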
Visualising Dataset using folium¶
I have iterated through each of the datasets and coloured the markers respectively: blue for bus stops, green for City Circle tram stops, red for pedestrian count sensor locations, and yellow for street lights. This provides a visual guide to the location of each street light, transport stop and pedestrian sensor. With this information we can better understand how to progress the project: where data is scarce, and which areas need more data sources to produce reliable information.
# Center map on Melbourne
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)
# Add markers from the dataset bus_stops_df
for _, row in bus_stops_df.iterrows():
if pd.notnull(row['latitude']) and pd.notnull(row['longitude']):
folium.CircleMarker(
location=[row['latitude'], row['longitude']],
radius=2,
color='blue',
fill=True
).add_to(m)
# Add markers for the City Circle tram stops
for _, row in city_circle_tram_stops.iterrows():
if pd.notnull(row['latitude']) and pd.notnull(row['longitude']):
folium.CircleMarker(
location=[row['latitude'], row['longitude']],
radius=4,
color='green',
fill=True
).add_to(m)
# Add markers from the dataset pedestrian_counting_df
for _, row in pedestrian_counting_df.iterrows():
if pd.notnull(row['latitude']) and pd.notnull(row['longitude']):
folium.CircleMarker(
location=[row['latitude'], row['longitude']],
radius=3,
color='red',
fill=True
).add_to(m)
# Add markers from the dataset street_lighting_df
for _, row in street_light_df.iterrows():
if pd.notnull(row['latitude']) and pd.notnull(row['longitude']):
folium.CircleMarker(
location=[row['latitude'], row['longitude']],
radius=2,
color='yellow',
fill=True
).add_to(m)
# Add markers from the dataset feature_light_df
for _, row in feature_light_df.iterrows():
if pd.notnull(row['latitude']) and pd.notnull(row['longitude']):
folium.CircleMarker(
location=[row['latitude'], row['longitude']],
radius=2,
color='yellow',
fill=True
).add_to(m)
# Add legend
legend_html = """
<div style="position: fixed;
bottom: 50px; left: 50px; width: 150px; height: 120px;
background-color: rgba(255, 255, 255, 0.8); border:2px solid grey; z-index:1000; font-size:12px;
padding: 10px;">
<b>Legend</b><br>
<i style="background:blue; width:10px; height:10px; display:inline-block; border-radius:50%;"></i> Bus Stops<br>
<i style="background:green; width:10px; height:10px; display:inline-block; border-radius:50%;"></i> Tram Stops<br>
<i style="background:red; width:10px; height:10px; display:inline-block; border-radius:50%;"></i> Pedestrian Counters<br>
<i style="background:yellow; width:10px; height:10px; display:inline-block; border-radius:50%;"></i> Street Lights<br>
</div>
"""
m.get_root().html.add_child(folium.Element(legend_html))
# Show map
m